A Ground Truth Inference Model for Ordinal Crowd-Sourced Labels Using Hard Assignment Expectation Maximization
Authors
Abstract
In this paper we propose an iterative approach for inferring the ground-truth value of an item from judgments collected from online workers. The method is designed specifically for cases in which the collected labels are ordinal. The algorithm iteratively solves a hard-assignment EM model and, once the EM procedure has converged, computes a single final expected value.
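The abstract gives no model details, so the following is only a minimal sketch of what a hard-assignment EM loop followed by a final expected-value step could look like. It is not the paper's method. Everything specific here is an assumption made for illustration: the function name `hard_em_ordinal`, the per-worker discretized Gaussian noise model, the rounded-mean initialisation, and the `min_var` floor.

```python
import math
from collections import defaultdict


def hard_em_ordinal(judgments, num_levels, max_iters=50, min_var=0.05):
    """Illustrative hard-assignment EM for ordinal crowd labels.

    judgments: list of (item, worker, label), label in 1..num_levels.
    Returns (hard_labels, expected_values), both dicts keyed by item.
    """
    levels = range(1, num_levels + 1)
    by_item = defaultdict(list)
    for item, worker, label in judgments:
        by_item[item].append((worker, label))

    def log_lik(label, truth, var):
        # Assumed noise model: Gaussian around the true level,
        # renormalised over the discrete ordinal scale.
        norm = sum(math.exp(-(l - truth) ** 2 / (2 * var)) for l in levels)
        return -(label - truth) ** 2 / (2 * var) - math.log(norm)

    def scores(votes, variances):
        # Log-likelihood of an item's judgments under each candidate level.
        return [sum(log_lik(l, k, variances[w]) for w, l in votes)
                for k in levels]

    # Initialise hard labels with the rounded mean of each item's judgments.
    hard = {i: round(sum(l for _, l in v) / len(v)) for i, v in by_item.items()}
    variances = {}
    for _ in range(max_iters):
        # M-step: per-worker noise variance against the current hard labels.
        sq_err = defaultdict(list)
        for item, worker, label in judgments:
            sq_err[worker].append((label - hard[item]) ** 2)
        variances = {w: max(sum(e) / len(e), min_var)
                     for w, e in sq_err.items()}
        # Hard E-step: assign each item its single most likely level.
        new_hard = {}
        for i, v in by_item.items():
            s = scores(v, variances)
            new_hard[i] = 1 + s.index(max(s))
        if new_hard == hard:  # assignments stopped changing
            break
        hard = new_hard

    # After convergence: one soft pass that turns the per-level scores
    # into a distribution and reports its expected value.
    expected = {}
    for i, v in by_item.items():
        s = scores(v, variances)
        top = max(s)
        weights = [math.exp(x - top) for x in s]
        total = sum(weights)
        expected[i] = sum(k * w for k, w in zip(levels, weights)) / total
    return hard, expected
```

Calling `hard_em_ordinal([("a", "w1", 1), ("a", "w2", 1), ("a", "w3", 2)], num_levels=5)` returns a hard label and a real-valued expected label for item `"a"`. The expected value is what lets the output fall between ordinal levels when workers disagree.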
Similar resources
LabelBoost: An Ensemble Model for Ground Truth Inference Using Boosted Trees
We introduce LabelBoost, an ensemble model that utilizes various label aggregation algorithms to build a higher precision algorithm. We compare this algorithm with majority vote, GLAD and an Expectation Maximization model on a publicly available dataset. The results suggest that by building an ensemble model, one can achieve higher precision value for aggregating crowd-sourced labels for an ite...
Annotation models for crowdsourced ordinal data
In supervised learning when acquiring good quality labels is hard, practitioners resort to getting the data labeled by multiple noisy annotators. Various methods have been proposed to estimate the consensus labels for binary and categorical labels. A commonly used paradigm to annotate instances when the labels are inherently subjective is to use ordinal scales. In this paper we propose annotato...
Improving Genre Annotations for the Million Song Dataset
Any automatic music genre recognition (MGR) system must show its value in tests against a ground truth dataset. Recently, the public dataset most often used for this purpose has been proven problematic, because of mislabeling, duplications, and its relatively small size. Another dataset, the Million Song Dataset (MSD), a collection of features and metadata for one million tracks, unfortunately ...
Quality Control of Crowd Labeling through Expert Evaluation
We propose a general scheme for quality-controlled labeling of large-scale data using multiple labels from the crowd and a “few” ground truth labels from an expert of the field. Expert-labeled instances are used to assign weights to the expertise of each crowd labeler and to the difficulty of each instance. Ground truth labels for all instances are then approximated through those weights along ...
Estimation of Discourse Segmentation Labels from Crowd Data
For annotation tasks involving independent judgments, probabilistic models have been used to infer ground truth labels from data where a crowd of many annotators labels the same items. Such models have been shown to produce results superior to taking the majority vote, but have not been applied to sequential data. We present two methods to infer ground truth labels from sequential annotations w...